Efficient Term Extraction and Indexing Approach in Small-Scale Web Search of Uyghur Language

Authors

  • Turdi Tohti
  • Winira Musajan
  • Askar Hamdulla
Abstract

In small-scale web search, the index should be kept in memory to avoid frequent hard-disk reads and writes and to speed up search. Representing the original information in less memory, however, requires index compression, which either increases computational cost or causes some loss of the original information. In this study of small-scale web search for Uyghur, the inverted index is built on a hash table data structure and kept entirely resident in memory to speed up retrieval and querying. For index compression, no compression technique is used; instead, we propose a word grouping approach based on a simplified N-gram statistical model. It extracts semantic words that are structurally stable, semantically complete, and independent, which greatly reduces the size of the index term list. This not only serves the purpose of index compression but also resolves the ambiguity problem to a certain extent and clearly improves search precision. The experimental results indicate that the method is feasible and effective.
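The two ideas in the abstract, grouping adjacent words into larger index terms and holding the inverted index in an in-memory hash table, can be illustrated with a minimal Python sketch. The abstract does not give the details of the simplified N-gram model, so the bigram score, the thresholds `min_count` and `min_score`, and the function names `group_terms` and `build_index` are all assumptions made for illustration, and the English sample data stands in for Uyghur text.

```python
from collections import Counter, defaultdict


def group_terms(docs, min_count=2, min_score=0.5):
    """Merge adjacent word pairs that co-occur often into one multi-word term.

    Hypothetical score: count(a b) / min(count(a), count(b)).
    This is an illustrative stand-in, not the model from the paper.
    """
    unigrams = Counter(w for doc in docs for w in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))

    def is_unit(a, b):
        n = bigrams[(a, b)]
        return n >= min_count and n / min(unigrams[a], unigrams[b]) >= min_score

    grouped = []
    for doc in docs:
        terms, i = [], 0
        while i < len(doc):
            # Greedy left-to-right merge of a qualifying adjacent pair.
            if i + 1 < len(doc) and is_unit(doc[i], doc[i + 1]):
                terms.append(doc[i] + " " + doc[i + 1])
                i += 2
            else:
                terms.append(doc[i])
                i += 1
        grouped.append(terms)
    return grouped


def build_index(grouped_docs):
    """Build an in-memory inverted index: term -> sorted list of document ids.

    A Python dict is a hash table, so term lookup is O(1) on average.
    """
    index = defaultdict(set)
    for doc_id, terms in enumerate(grouped_docs):
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Toy corpus of pre-tokenized documents.
docs = [
    "information retrieval system design".split(),
    "uyghur information retrieval".split(),
    "web search system".split(),
]
index = build_index(group_terms(docs))
print(index["information retrieval"])  # [0, 1]
print(len(index))  # 6 index terms instead of 7 distinct words
```

In this toy corpus, "information" and "retrieval" always appear together, so they are indexed as a single term and the term list shrinks by one entry. A query for the grouped term matches only documents that contain the full phrase, which is the kind of ambiguity reduction the abstract describes.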

Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

High Fuzzy Utility Based Frequent Patterns Mining Approach for Mobile Web Services Sequences

Nowadays, high fuzzy utility based pattern mining is an emerging topic in data mining. It refers to discovering all patterns with a high utility that meets a user-specified minimum high utility threshold. It comprises extracting patterns which are highly accessed in mobile web service sequences. Different from the traditional fuzzy approach, high fuzzy utility mining considers not only counts of mob...

Integrating RDF Querying Capabilities into a Distributed Search Infrastructure

The Semantic Web is inherently distributed, and covers both metadata and full-text information. Semantic search therefore can profit a lot from peer-to-peer infrastructures as well as from powerful metadata search functionalities based on full-text search technologies. In this paper we focus on an approach extending an existing P2P search infrastructure with RDF querying capabilities, which bot...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...

Query-Driven Indexing in Large-Scale Distributed Systems

Efficient and effective search in large-scale data repositories requires complex indexing solutions deployed on a large number of servers. Web search engines such as Google and Yahoo! already rely upon complex systems to be able to return relevant query results and keep processing times within the comfortable sub-second limit. Nevertheless, the exponential growth of the amount of content on the...

Journal title:
  • Journal of Multimedia

Volume 8  Issue 

Pages  -

Publication date 2013